Evaluation of the Highest Probability SVM Nearest Neighbor Classifier with Variable Relative Error Cost
نویسندگان
چکیده
In this paper we evaluate the performance of the highest probability SVM nearest neighbor (HP-SVM-NN) classifier, which combines the ideas of the SVM and k-NN classifiers, on the task of spam filtering. To classify a sample, the HP-SVM-NN classifier does the following: for each k in a predefined set {k1, ..., kN} it trains an SVM model on k nearest labeled samples, uses this model to classify the given sample, and transforms the output of SVM into posterior probabilities of the classes using sigmoid approximation; than it selects that of the 2×N resulting answers which has the highest probability. The evaluation shows that in terms of ROC curves the algorithm is able outperform pure SVM.
منابع مشابه
Highest Probability Svm Nearest Neighbor Classifier for Spam Filtering
In this paper we evaluate the performance of the highest probability SVM nearest neighbor classifier, which is a combination of the SVM and k-NN classifiers, on a corpus of email messages. To classify a sample the algorithm performs the following actions: for each k in a predefined set {k1, ..., kN} it trains an SVM model on k nearest labelled samples, and uses this model to classify the given ...
متن کاملکاربرد الگوریتمهای دادهکاوی در تفکیک منابع رسوبی حوزۀ آبخیز نوده گناباد
Introduction: Reduction of sediment supply requires the implementation of soil conservation and sediment control programs in the form of watershed management plans. Sediment control programs require identifying the relative importance of sediment sources, their quantitative ascription and identification of critical areas within the watersheds. The sediment source ascription is involves two...
متن کاملNon-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...
متن کاملDiabetes Prediction by Optimizing the Nearest Neighbor Algorithm Using Genetic Algorithm
Introduction: Diabetes or diabetes mellitus is a metabolic disorder in body when the body does not produce insulin, and produced insulin cannot function normally. The presence of various signs and symptoms of this disease makes it difficult for doctors to diagnose. Data mining allows analysis of patients’ clinical data for medical decision making. The aim of this study was to provide a model fo...
متن کاملInstance-Based Spam Filtering Using SVM Nearest Neighbor Classifier
In this paper we evaluate an instance-based spam filter based on the SVM nearest neighbor (SVM-NN) classifier, which combines the ideas of SVM and k-nearest neighbor. To label a message the classifier first finds k nearest labeled messages, and then an SVM model is trained on these k samples and used to label the unknown sample. Here we present preliminary results of the comparison of SVM-NN wi...
متن کامل